For data-centric systems, provenance tracking is particularly important when the system is open and
decentralised, such as the Web of Linked Data. In this paper, a concise but expressive calculus 
which models data updates is presented. The calculus is used to provide an operational semantics 
for a system where data and updates interact concurrently. The operational semantics of the calculus 
also tracks the provenance of data with respect to updates. This provides a new formal semantics 
extending provenance diagrams which takes into account the execution of processes in a concurrent 
setting. Moreover, a sound and complete model for the calculus based on ideals of series-parallel 
DAGs is provided. The notion of provenance introduced can be used as a subjective indicator of the 
quality of data in concurrent interacting systems. 

1 Introduction 

There is a growing trend to publish data openly on the Web. This movement is gaining significant 
momentum as the governments of several countries and numerous other organisations adopt common 
principles for publishing data Q. Data published according to these principles is referred to as Linked 
Data, due to the use of URIs to establish links between published data. By establishing links between 
arbitrary data sets, significant problems emerge that are of a different flavour to those associated with 
traditional closed databases. 

Many of the new problems which emerge in this scenario are due to the decentralised nature of
the published data. Some significant challenging problems include: the efficient execution of distributed 
queries and processes which exploit multiple data sources; the impossibility of enforcing a global schema 
on data; the lack of boundaries for data ensuring the impossibility of complete results; and establishing 
global standards for data formats and protocols. 

This work considers another essential problem, which reflects the diversity of published data. The 
challenge considered here is that each piece of data published has a varying degree of trust or relevance. 
A user may consider data published by the BBC to be more trustworthy than data published on a personal 
blog. However, if the blog is run by a political activist that the consumer of data approves of, then the 
blog may be more relevant. Thus data should not be associated with a specific trust measure. Instead, 
some extra information about the data should be tracked, i.e. the provenance of data. From the extra 
information provided by the provenance of data, the consumer may judge the quality of the associated 
data according to their own policy. 

Provenance can track several characteristics of the origin of data. Characteristics include "who" has 
influenced the data, "where" the data has been located, and "how" the data is produced Q. For Linked 
Data, a basic notion of "where" provenance called a named graph, which indicates where the data is 
located now, is the recognised standard [5]. In related work, a model extending named graphs is used
to track more comprehensive "where" and "who" provenance [8]. The related work associates trees of
identifiers for agents and locations with data. This allows a history of where the data has been published 
and who published it to be tracked. 

This work focuses on a form of "how" provenance. This form of "how" provenance tracks causal
relationships between stored data and data that was used to produce the data [6]. For instance, due to
a change in usage of a building, data about the building may be updated. The updates may replace or 
extract information from the original data. Thus "how" provenance can be used to determine how old 
data influenced the new data with respect to an update. 

The notion of "how" provenance investigated is strongly related to event based models of
causality [3, 20]. This model clarifies, for the first time, the relationship between concurrent processes and the
provenance diagrams that they produce. An operational semantics formalises the operational behaviour
of processes while recording the provenance associated with the resulting data. The model presented 
provides insight that may be used to refine the definition of provenance diagrams. Provenance diagrams 
that arise from concurrent updates are guaranteed to be in a particular (series-parallel) form. This insight 
is a contribution to the effort to establish a common notion of a provenance diagram [15]. Furthermore,
the model presented is proven to be sound and complete. The formal model provides a foundation for in- 
vestigating problems associated with tracking and exploiting the provenance of data, including querying 
provenance EKTJ, and employing trust metrics [10].



2 Causal Dependencies in Provenance Diagrams 

This work focusses on a particular aspect of provenance tracking. The aspect considered is a form of 
"how" provenance, which indicates how old data contributed to producing new data. The consensus in 
the provenance community is that provenance diagrams which record this information form a directed 
acyclic graph (DAG), where the edges are transitively closed. A standard format for representing
provenance, called the Open Provenance Model [15], encompasses this notion of provenance. The informal
definitions provided by the standard are as follows. 

For this work, artefacts are data tuples. The was derived from relation between artefacts is such that
if there is an edge from artefact $[d_0]$ to artefact $[d_1]$, then there is a causal relationship that indicates that
$[d_1]$ needs to have been generated to enable $[d_0]$ to be generated. The standard defines a multi-step was
derived from relation. This is simply the transitive closure of the was derived from relation, indicating
that an artefact had an influence on another artefact.
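
The multi-step relation can be computed mechanically. The following is a minimal Python sketch, assuming the was derived from relation is given as a set of pairs (newer, older); the function name and the artefact names d0, d1, d2 are hypothetical and are not taken from the Open Provenance Model.

```python
from itertools import product


def multi_step_derivations(derived_from):
    """Return the transitive closure of a set of (newer, older) edges."""
    closure = set(derived_from)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so that the set can grow inside the loop.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


# Hypothetical artefacts: d2 was derived from d1, and d1 from d0.
edges = {("d2", "d1"), ("d1", "d0")}
closure = multi_step_derivations(edges)
```

Here the closure contains the extra pair ("d2", "d0"), which records that d0 influenced d2 in more than one step.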

A provenance diagram that indicates the provenance of two stored pieces of data, where the stored 
data is indicated by an overline, is presented in Fig. 1. The example is used throughout this work and
concerns monuments adjacent to the venue of the workshop. 



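\[
\begin{array}{ccc}
[(\mathit{Turner}\ \mathit{loc}\ \mathit{Tate})] & \qquad & [(\mathit{Turner}\ \mathit{loc}\ \mathit{London})] \\
\uparrow & & \uparrow \\
[(\mathit{Turner}\ \mathit{loc}\ \mathit{Baltic})] & & \overline{(\mathit{Turner}\ \mathit{loc}\ \mathit{UK})} \\
\uparrow & & \\
\overline{(\mathit{Turner}\ \mathit{loc}\ \mathit{Tate})} & &
\end{array}
\]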



Figure 1: The Turner Prize is held at the Tate Britain in London. However, in 2011 it was held in The 
Baltic Gallery in Gateshead, but returned to the Tate Britain in 2012. The data in the above diagram is 
about the location of the Turner Prize. Edges are causal relationships indicating the data consumed to 
produce new data as the location of the Turner Prize is updated. 

3 A Syntax and Semantics for Provenance Tracking Data Updates

This section introduces the syntax and semantics for a concurrent interacting system that tracks
provenance. The provenance community have introduced provenance structures based on DAGs; therefore a
process model which gives rise to DAGs is considered [15]. Unfortunately, many models of concurrency
are based on traces or trees rather than DAGs, such as in the calculus and provenance structures intro- 
duced in [18]. This limitation is addressed by providing a non-interleaving semantics, inspired by 19] . 

3.1 An Abstract Syntax for Processes 

The grammar for processes is provided in Fig. 2. The concepts are summarised and made precise by the
operational and denotational semantics presented in this work. 





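\[
\begin{array}{rcll}
P & ::= & I & \text{skip} \\
  & \mid & d & \text{consume data} \\
  & \mid & \overline{d} & \text{stored data} \\
  & \mid & [d] & \text{artefact} \\
  & \mid & P \mathbin{;} P & \text{seq} \\
  & \mid & P \parallel P & \text{par} \\
  & \mid & P \oplus P & \text{choose} \\
  & \mid & \exists x. P & \text{exists}
\end{array}
\qquad
\begin{array}{rcll}
  &     & a & \text{name} \\
  &     & x & \text{variable} \\
A & ::= & x \mid a & \text{variable or name} \\
d & ::= & A \mid A\,A \mid A\,A\,A \mid \ldots & \text{data tuple}
\end{array}
\]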



Figure 2: The syntax of processes. 



The data tuples. The basic unit of information considered in this work is a tuple of names. Tuples are 
commonly used to convey data. Linked Data is based on RDF, which involves triples of names [2, 13].
RDF makes use of URIs for names, since URIs provide a globally recognised naming system. In Linked
Data, RDF triples are often extended to quadruples of URIs, where the extra URI indicates where the triple
is located [5]. This provides a basic notion of "where" provenance. This notion of "where" provenance
is extended in [8]. 

The artefacts. A data tuple can be stored, represented as $\overline{d}$. Stored data can then be consumed in an
interaction with the process $d$. The result is an artefact $[d]$, used to explicitly track interactions which
have occurred. An artefact is used to record a data tuple involved in an interaction. Artefacts are used to 
capture "how" provenance. 

The multiplicatives. There are two multiplicative operators. The "par" multiplicative represents the 
parallel composition of processes where interactions between processes are permitted. The "seq" multi- 
plicative represents the strict sequential composition of processes, where the first process must terminate 
before the second process begins, hence the second process is causally dependent on the first process. 
There is one unit for the multiplicatives, namely skip, which represents a successful action with no side 
effects. 
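
\[
\begin{array}{c}
I \mathbin{;} Q = Q \mathbin{;} I = Q \parallel I = Q \qquad
P \parallel (Q \parallel R) = (P \parallel Q) \parallel R \qquad
P \mathbin{;} (Q \mathbin{;} R) = (P \mathbin{;} Q) \mathbin{;} R \qquad
Q \parallel R = R \parallel Q \\[1ex]
(P \oplus Q) \oplus R = P \oplus (Q \oplus R) \qquad
P \oplus Q = Q \oplus P \qquad
P \oplus P = P \\[1ex]
(P \oplus Q) \mathbin{;} R = (P \mathbin{;} R) \oplus (Q \mathbin{;} R) \qquad
P \mathbin{;} (Q \oplus R) = (P \mathbin{;} Q) \oplus (P \mathbin{;} R) \qquad
(P \oplus Q) \parallel R = (P \parallel R) \oplus (Q \parallel R) \\[1ex]
\exists x.(P \oplus Q) = \exists x.P \oplus \exists x.Q \qquad
\exists x.I = I \\[1ex]
\exists x.(P \parallel S) = \exists x.P \parallel S \qquad
\exists x.(S \mathbin{;} Q) = S \mathbin{;} \exists x.Q \qquad
\exists x.(P \mathbin{;} S) = \exists x.P \mathbin{;} S
\end{array}
\]

where S is a process where x does not appear free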

Figure 3: The structural congruence, which can be applied at any point in a derivation. 

The additives. There are two additives: $\oplus$ represents a choice between two branches; $\exists$ represents a
choice between all possible name substitutions for the bound variable.

3.2 Operational Semantics of Processes 

Deductive systems are typically presented using inference rules applied at the base of a syntax tree, as in
the sequent calculus. However, such systems are unsuited to systems which mix commutative and
non-commutative operators [11]. For this reason, a deep inference style of presentation is adopted, where
inference rules can be applied at an arbitrary depth in a formula.

A structural congruence which extends α-conversion is introduced in Fig. 3, which is used to
rearrange processes. The structural congruence ensures that the order of composition matters for sequential
composition, but does not matter for parallel composition. For simplicity, both parallel composition and 
sequential composition share the same unit. The structural congruence handles contraction for choice, 
using idempotency. The other rules of the structural congruence determine how operators distribute over 
each other. Distributivity properties are used in related models of concurrency (9JQ21. Note that this 
selection of rules is not minimal; however they are used in Sec. 5 to rewrite processes into normal forms,
thereby simplifying the completeness proof. 

A deductive system is presented in Fig. 4. Deductions may be applied at any depth in a process,
as with the structural congruence. Deductions are presented with the premise above the line and the 
conclusion below the line. 



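\[
\frac{[d]}{d \parallel \overline{d}}\ \text{interact}
\qquad
\frac{(P \parallel Q) \mathbin{;} (P' \parallel Q')}{(P \mathbin{;} P') \parallel (Q \mathbin{;} Q')}\ \text{sequence}
\qquad
\frac{P}{P \oplus Q}\ \text{choice}
\qquad
\frac{P\{a/x\}}{\exists x. P}\ \text{exists}
\]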

Figure 4: The deductive system for processes. All deductions can be applied in any context. 



The interact rule. The interact rule only applies to tuples. The rule indicates that a stored tuple is
consumed by the process which deletes that tuple. The result of the interaction is an artefact that records
the consumed tuple.



The sequence rule. The sequence rule reorders processes composed in parallel. The premise is more 
deterministic than the conclusion. The premise decides which part of the process will execute first; 
whereas the conclusion leaves open several other opportunities. This rule allows parts of a process to travel
to the intended location where they will interact. This rule appears in related models of concurrency (9j 
MM- 

The additives. The premises of the additives indicate the branch that is chosen. For choice, either the 
left or the right branch is chosen. For exists, any name may be substituted for the bound variable. This 
kind of choice is known as external choice in process calculi, where exists is an infinite external choice. 
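
Read from conclusion to premise, the sequence and interact rules describe one step of evolution each. The following is a minimal Python sketch of that reading, assuming a hypothetical encoding of processes as nested tuples such as ("par", P, Q) and ("seq", P, Q); the encoding and the function names are illustrative only, and the rules are applied at the root of a term rather than at arbitrary depth.

```python
SKIP = ("skip",)  # the shared unit of seq and par


def sequence_step(term):
    """(P ; P') | (Q ; Q')  evolves to  (P | Q) ; (P' | Q')."""
    if term[0] == "par" and term[1][0] == "seq" and term[2][0] == "seq":
        (_, p, p2), (_, q, q2) = term[1], term[2]
        return ("seq", ("par", p, q), ("par", p2, q2))
    return None


def interact_step(term):
    """d | stored d  evolves to  the artefact [d]."""
    if term[0] == "par":
        a, b = term[1], term[2]
        if {a[0], b[0]} == {"consume", "stored"} and a[1] == b[1]:
            return ("artefact", a[1])
    return None


old = ("Mill", "depiction", "photo")
new = ("Baltic", "depiction", "photo")
# (consume old ; stored new) | (stored old ; skip), using the unit law for seq.
start = ("par",
         ("seq", ("consume", old), ("stored", new)),
         ("seq", ("stored", old), SKIP))
after_sequence = sequence_step(start)
artefact = interact_step(after_sequence[1])
final = ("seq", artefact, after_sequence[2])
```

The sequence step first brings the delete and the stored tuple together; only then can the interact step replace the pair by an artefact, which the remaining stored tuple is sequenced after.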



4 A Process Calculus for Provenance Tracking Updates 

This section identifies a sub-grammar of processes that model certain systems. The systems modelled 
are those which involve stored data composed in parallel with updates. The updates involve the removal 
of some stored data satisfying a query, followed by the insertion of some new stored data. 

The operational semantics for processes are provided by the rules of the system in the previous 
section. A system can evolve to a given state if and only if the new state entails the original state. Notice 
that implication is in the opposite direction to the evolution of the system. The direction of implication 
is in line with related approaches to operational semantics [14].



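\[
\begin{array}{rcl}
\mathit{Data} & ::= & I \\
 & \mid & \overline{d} \\
 & \mid & \mathit{Data} \parallel \mathit{Data} \\[1ex]
\mathit{Query} & ::= & I \\
 & \mid & d \\
 & \mid & \mathit{Query} \parallel \mathit{Query} \\
 & \mid & \mathit{Query} \oplus \mathit{Query} \\
 & \mid & \exists x. \mathit{Query}
\end{array}
\qquad
\begin{array}{rcl}
\mathit{Update} & ::= & \mathit{Query} \mathbin{;} \mathit{Data} \\
 & \mid & \mathit{Update} \oplus \mathit{Update} \\
 & \mid & \exists x. \mathit{Update} \\[1ex]
\mathit{System} & ::= & I \\
 & \mid & [d] \\
 & \mid & \mathit{Update} \\
 & \mid & \overline{d} \\
 & \mid & \mathit{System} \mathbin{;} \mathit{System} \\
 & \mid & \mathit{System} \parallel \mathit{System}
\end{array}
\]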



Figure 5: Sub-algebras of processes for data, queries, updates and systems. 



Data. Data simply represents zero or more stored data atoms. The following presents two stored triples 
in RDF format, which consist of three URIs: the subject, property, and object. 

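\[
\overline{(\mathit{Sage}\ \mathit{rdf{:}type}\ \mathit{Concert\_Hall})} \parallel \overline{(\mathit{Baltic}\ \mathit{rdf{:}type}\ \mathit{Art\_Gallery})}
\]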



Note that all names are active URIs which link to real published Linked Data. The reader is invited to 
follow the URIs to witness the examples in a real context. 

Queries. Parallel composition $\parallel$ and choice $\oplus$ are exploited to model the following queries. As in [13],
the existential quantifier is used to select URIs which occur in data. The following pattern uses choice
to select between two objects. This example discovers a concert hall located in either Newcastle or
Gateshead.

\[
\exists x.\big( (x\ \mathit{rdf{:}type}\ \mathit{Concert\_Hall}) \parallel ( (x\ \mathit{loc}\ \mathit{Newcastle}) \oplus (x\ \mathit{loc}\ \mathit{Gateshead}) ) \big)
\]

Note that a tighter operational semantics could be provided by using a tensor product to join queries [13].
A tensor product ensures both parts of a query are answered atomically. Unfortunately, the calculus for
Linked Data in [13] has an interleaving semantics, which would give rise to trees of provenance diagrams
as in [18]. Future work would be to combine the strengths of both calculi.



Updates. The following is an example of an update which applies to some stored data. The existential 
quantification discovers a name which is used in the delete statement and the data stored after the delete. 
The Baltic Art Gallery is a converted flour mill. The update turns a depiction of the old flour mill into a 
depiction of the new art gallery. 

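\[
\overline{(\mathit{Mill}\ \mathit{depiction}\ \mathit{photo})} \parallel \exists x.\big( (\mathit{Mill}\ \mathit{depiction}\ x) \mathbin{;} \overline{(\mathit{Baltic}\ \mathit{depiction}\ x)} \big)
\]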

The above process is provable from the following process, using the exists, sequence and interact rules. 
This means that the system above can evolve to the system below. 

\[
[(\mathit{Mill}\ \mathit{depiction}\ \mathit{photo})] \mathbin{;} \overline{(\mathit{Baltic}\ \mathit{depiction}\ \mathit{photo})}
\]

Notice that the original stored triple appears as an artefact, which the new triple is dependent on. This 
provides "how" provenance that indicates the old triple used to create the new triple. 
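
The effect of such an update on a store can be sketched in a few lines of Python. This is an illustrative sketch only, assuming that stored data is a set of tuples of strings and that variables are written with a leading "?" (a convention chosen here, not part of the calculus); the function names are hypothetical. The matching plays the role of the exists rule, and removing the matched tuple plays the role of the interact rule.

```python
def match(pattern, data):
    """Return a substitution making pattern equal to data, or None."""
    if len(pattern) != len(data):
        return None
    subst = {}
    for p, a in zip(pattern, data):
        if p.startswith("?"):
            # A variable must be bound to the same name everywhere.
            if subst.setdefault(p, a) != a:
                return None
        elif p != a:
            return None
    return subst


def apply_update(stored, pattern, template):
    """Consume one stored tuple matching pattern and store the template.

    Returns the new store and a pair (artefact, new_tuple) recording that
    the new tuple depends on the consumed tuple, or None if nothing matched.
    """
    for data in sorted(stored):
        subst = match(pattern, data)
        if subst is not None:
            new = tuple(subst.get(t, t) for t in template)
            return (stored - {data}) | {new}, (data, new)
    return stored, None


store = {("Mill", "depiction", "photo")}
store_after, dependency = apply_update(
    store, ("Mill", "depiction", "?x"), ("Baltic", "depiction", "?x"))
```

The returned pair corresponds to the sequential composition of the artefact with the new stored triple: the consumed triple is kept as provenance for the triple that replaced it.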



Distinctions between execution paths. There are multiple ways of evaluating processes. Different 
methods of evaluation can give rise to different provenance. Here three distinct executions of the same
process are presented to demonstrate the complexity of provenance tracking in a concurrent setting. 

An example which involves two updates executed in parallel is presented below. It is a common 
misconception that The Sage and Baltic Art Gallery are prominent monuments in Newcastle. In reality 
they are located in Gateshead on the opposite bank of the river Tyne.¹ The updates transform the location
of these monuments from Newcastle to Gateshead. 

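\[
\begin{array}{l}
\overline{(\mathit{Sage}\ \mathit{loc}\ \mathit{Gateshead})} \parallel \big( (\mathit{Sage}\ \mathit{loc}\ \mathit{Gateshead}) \mathbin{;} \overline{(\mathit{Sage}\ \mathit{loc}\ \mathit{Newcastle})} \big) \parallel {} \\
\overline{(\mathit{Baltic}\ \mathit{loc}\ \mathit{Gateshead})} \parallel \big( (\mathit{Baltic}\ \mathit{loc}\ \mathit{Gateshead}) \mathbin{;} \overline{(\mathit{Baltic}\ \mathit{loc}\ \mathit{Newcastle})} \big)
\end{array}
\]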

The process below yields the process above, using the sequence rule. The two updates occur indepen- 
dently, hence each provenance is independent. 



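\[
\begin{array}{l}
\big( [(\mathit{Sage}\ \mathit{loc}\ \mathit{Gateshead})] \mathbin{;} \overline{(\mathit{Sage}\ \mathit{loc}\ \mathit{Newcastle})} \big) \parallel {} \\
\big( [(\mathit{Baltic}\ \mathit{loc}\ \mathit{Gateshead})] \mathbin{;} \overline{(\mathit{Baltic}\ \mathit{loc}\ \mathit{Newcastle})} \big)
\end{array}
\]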

The process below yields both processes above. This suggests that the two updates were combined before
they were applied, hence data produced by each update is dependent on the artefact of the other update. 
Therefore the process below has stronger dependencies than the process above. 

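\[
\big( [(\mathit{Sage}\ \mathit{loc}\ \mathit{Gateshead})] \parallel [(\mathit{Baltic}\ \mathit{loc}\ \mathit{Gateshead})] \big) \mathbin{;} \big( \overline{(\mathit{Sage}\ \mathit{loc}\ \mathit{Newcastle})} \parallel \overline{(\mathit{Baltic}\ \mathit{loc}\ \mathit{Newcastle})} \big)
\]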

Indeed the above process can be refined further to impose a sequential dependency on the artefacts. Thus 
the execution of the concurrent processes greatly affects the form of "how" provenance. 



¹ Indeed the venue of FOCLASA 2012 is also in Gateshead, rather than in Newcastle.

The Turner Prize revisited. The operational behaviour which gives rise to the provenance diagram in
Fig. 1 can now be expressed. The initial configuration is expressed below. It shows two stored
triples, an update that moves the exhibition from the Tate Britain to the Baltic and broadens London to 
the UK, and an update which moves the exhibition back from the Baltic to the Tate Britain. 

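\[
\begin{array}{l}
\overline{(\mathit{Turner}\ \mathit{loc}\ \mathit{London})} \parallel \overline{(\mathit{Turner}\ \mathit{loc}\ \mathit{Tate})} \parallel {} \\
\Big( \big( (\mathit{Turner}\ \mathit{loc}\ \mathit{Tate}) \parallel (\mathit{Turner}\ \mathit{loc}\ \mathit{London}) \big) \mathbin{;} \big( \overline{(\mathit{Turner}\ \mathit{loc}\ \mathit{Baltic})} \parallel \overline{(\mathit{Turner}\ \mathit{loc}\ \mathit{UK})} \big) \Big) \parallel {} \\
\big( (\mathit{Turner}\ \mathit{loc}\ \mathit{Baltic}) \mathbin{;} \overline{(\mathit{Turner}\ \mathit{loc}\ \mathit{Tate})} \big)
\end{array}
\]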

By applying the sequence rule several times the processes can be rearranged as follows. 

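\[
\begin{array}{l}
\big( (\mathit{Turner}\ \mathit{loc}\ \mathit{Tate}) \parallel \overline{(\mathit{Turner}\ \mathit{loc}\ \mathit{Tate})} \parallel \overline{(\mathit{Turner}\ \mathit{loc}\ \mathit{London})} \parallel (\mathit{Turner}\ \mathit{loc}\ \mathit{London}) \big) \mathbin{;} {} \\
\Big( \big( ( \overline{(\mathit{Turner}\ \mathit{loc}\ \mathit{Baltic})} \parallel (\mathit{Turner}\ \mathit{loc}\ \mathit{Baltic}) ) \mathbin{;} \overline{(\mathit{Turner}\ \mathit{loc}\ \mathit{Tate})} \big) \parallel \overline{(\mathit{Turner}\ \mathit{loc}\ \mathit{UK})} \Big)
\end{array}
\]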

Finally, by applying the interact rule the delete operations and stored data cancel each other out. The 
interactions produce the artefacts that record the provenance of the data.

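\[
\begin{array}{l}
\big( [(\mathit{Turner}\ \mathit{loc}\ \mathit{Tate})] \parallel [(\mathit{Turner}\ \mathit{loc}\ \mathit{London})] \big) \mathbin{;} {} \\
\Big( \big( [(\mathit{Turner}\ \mathit{loc}\ \mathit{Baltic})] \mathbin{;} \overline{(\mathit{Turner}\ \mathit{loc}\ \mathit{Tate})} \big) \parallel \overline{(\mathit{Turner}\ \mathit{loc}\ \mathit{UK})} \Big)
\end{array}
\]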

The next section provides a denotational semantics where the denotation of the above process is exactly the
provenance diagram in Fig. 1.



5 A Denotational Semantics for the Provenance Tracking Calculus 

This section provides a denotational semantics for the calculus. A denotational semantics provides a 
sound and complete model which increases confidence in the definition of the calculus. In this case, the 
semantics of the calculus fulfils an additional purpose. It also makes explicit the connection between cer- 
tain terms of the calculus and provenance diagrams. Furthermore, a restriction on provenance diagrams 
that track series-parallel computations is highlighted. 

The denotational semantics, similarly to provenance diagrams, is based on directed acyclic graphs 
(DAGs). The denotation relies on some technical apparatus. Firstly, DAGs are restricted by a forbidden 
minor property, which guarantees that each DAG arises from applying series and parallel composition 
to smaller DAGs. Secondly, homomorphisms between DAGs are defined such that the inference rules 
of the calculus hold. By taking ideals of series-parallel DAGs with respect to these homomorphisms, a
sound and complete model is obtained. 



5.1 Series-Parallel DAGs and the N-free Condition 

This section recalls some standard definitions which are used to build a denotational semantics. The 
definition of a DAG is standard, as are the definitions of the transitive closure of a graph and the notion 
of a graph homomorphism. Transitive DAGs are used because provenance diagrams are transitive, and 
graph homomorphisms are used to compare the structure of such diagrams.

Definition 5.1. A DAG $D = (V,E)$ is a digraph with no directed cycles. Let $A = (V,E)$ and $B = (V',E')$ be
graphs. A graph homomorphism is given by a function on vertices $f : V \to V'$ such that if $(u,v) \in E$ then
$(f(u), f(v)) \in E'$. Two graphs are isomorphic iff there exists a bijective homomorphism whose inverse
function is also a homomorphism. A transitive digraph is such that if there exists a path from u to v, then
there exists an edge from u to v. A transitive closure of a digraph $(V,E)$ is a minimal transitive digraph
$(V,E')$ such that there exists an injective graph homomorphism from $(V,E)$ to $(V,E')$. A graph $(V,E)$ is a
sub-graph of $(V',E')$ if and only if $V \subseteq V'$ and $E = E' \cap (V \times V)$.

Several series-parallel digraphs are studied in [19]. Here transitive series-parallel DAGs are defined. 
The series-parallel restriction on transitive DAGs is required because this work considers provenance 
diagrams which arise from the execution of series-parallel processes. 

Definition 5.2. The trivial DAG with no vertices is a series-parallel DAG, and the DAG with a single
vertex and no edges is a series-parallel DAG. If $G_0 = (V_0,E_0)$ and $G_1 = (V_1,E_1)$ are series-parallel
graphs with disjoint vertices, then the following are series-parallel DAGs.

• $G_0 \parallel G_1$ defined by $(V_0 \cup V_1, E_0 \cup E_1)$.

• $G_0 \mathbin{;} G_1$ defined as the transitive closure of $(V_0 \cup V_1, E_0 \cup E_1 \cup (L \times R))$, where L is the set of source nodes
of $G_0$ and R is the set of sink nodes of $G_1$.
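
The two compositions can be sketched in Python as follows. This is a sketch under an explicit assumption: for transitive DAGs, sequencing one graph before another is taken to amount to adding an edge from every vertex of the first graph to every vertex of the second, so no separate closure step is needed. The orientation of these edges and the function names are choices made for illustration.

```python
def vertex(name):
    """The DAG with a single vertex and no edges."""
    return (frozenset({name}), frozenset())


def parallel(g0, g1):
    """Disjoint union of vertices and edges."""
    (v0, e0), (v1, e1) = g0, g1
    assert not (v0 & v1), "vertex sets must be disjoint"
    return (v0 | v1, e0 | e1)


def series(g0, g1):
    """Union plus an edge from every vertex of g0 to every vertex of g1.

    Assumption: both inputs are already transitive, so the result is
    transitive without computing a closure.
    """
    (v0, e0), (v1, e1) = g0, g1
    assert not (v0 & v1), "vertex sets must be disjoint"
    cross = {(u, w) for u in v0 for w in v1}
    return (v0 | v1, e0 | e1 | cross)


# (a || b) ; c
example = series(parallel(vertex("a"), vertex("b")), vertex("c"))
```

In the example, a and b remain unrelated while both are related to c, which is the shape produced when two parallel artefacts are followed by one piece of stored data.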

Structural graph theory studies how graph classes can be defined either by forbidden minors
or by gluing together simple starting graphs (as in the definition above). A forbidden minor
is a sub-graph with a particular form; the forbidden minor for series-parallel DAGs has an N-shape, as
proven in [19].

Theorem 5.3 (Forbidden minor). A transitive DAG is series-parallel if and only if it does not have a
sub-graph isomorphic to $N = (\{v_0,v_1,v_2,v_3\}, \{(v_2,v_0),(v_3,v_0),(v_3,v_1)\})$.
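
The N-free condition can be checked by brute force on small graphs. The following Python sketch assumes that a sub-graph is an induced sub-graph, as in Definition 5.1, and simply tries every ordered choice of four vertices; the function name is hypothetical and the search is exponential in nothing but fixed at four nested choices, so it is only intended for small examples.

```python
from itertools import permutations

# Edges of N over positions 0..3: (v2,v0), (v3,v0), (v3,v1).
N_EDGES = {(2, 0), (3, 0), (3, 1)}


def has_induced_n(vertices, edges):
    """Return True if some four vertices induce a sub-graph isomorphic to N."""
    for quad in permutations(vertices, 4):
        induced = {(i, j)
                   for i in range(4) for j in range(4)
                   if (quad[i], quad[j]) in edges}
        if induced == N_EDGES:
            return True
    return False


n_shape = has_induced_n({0, 1, 2, 3}, {(2, 0), (3, 0), (3, 1)})
# Adding the missing edge (2, 1) relates both upper vertices to both lower ones.
completed = has_induced_n({0, 1, 2, 3}, {(2, 0), (3, 0), (3, 1), (2, 1)})
```

The second graph arises by sequencing two parallel vertices before two other parallel vertices, so by the theorem it should contain no N-shape, which is what the check reports.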

Notice that the use of transitive DAGs is motivated by provenance diagrams, while the series-parallel
restriction is motivated by concurrent processes. Thus the model studies structures which respect both 
provenance and the processes which track the provenance. 

5.2 Interacting Series-Parallel DAGs Labelled with Data 

The notion of a series-parallel DAG is extended with labels. The labels allow data to be accommodated 
in the model. Also the notion of a homomorphism is extended to allow interactions between data and 
operations on data which give rise to artefacts. 

The definition of a labelled graph is standard. A special kind of homomorphism is defined on labelled 
DAGs. This smoothing homomorphism is bijective, but does not define an isomorphism. Thus vertices 
are preserved, but extra edges may appear. 

Definition 5.4. Fix S as the set of labels which are either tuples $d$, stored tuples $\overline{d}$ or artefacts $[d]$. A
labelled graph $(V,E,\mu)$ is such that $(V,E)$ is a graph and $\mu : V \to S$ is a labelling function from vertices
to labels. Let $A = (V,E,\mu)$ and $B = (V',E',\mu')$ be labelled DAGs. A labelled homomorphism f from A to
B is such that f is a graph homomorphism from $(V,E)$ to $(V',E')$ and for all vertices u, $\mu(u) = \mu'(f(u))$.
A smoothing homomorphism is a bijective labelled homomorphism.
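
The conditions of this definition can be checked directly on small examples. The following Python sketch assumes a labelled graph is given as a triple of a vertex set, an edge set and a dictionary of labels, and that a candidate map is a dictionary on vertices; this representation and the function names are illustrative assumptions.

```python
def is_labelled_homomorphism(f, a, b):
    """Check that f preserves edges and labels from a to b."""
    (va, ea, la), (vb, eb, lb) = a, b
    return (all(f[u] in vb for u in va)
            and all((f[u], f[v]) in eb for (u, v) in ea)
            and all(la[u] == lb[f[u]] for u in va))


def is_smoothing(f, a, b):
    """A smoothing homomorphism is a bijective labelled homomorphism."""
    va, vb = a[0], b[0]
    image = {f[u] for u in va}
    bijective = len(image) == len(va) and image == set(vb)
    return bijective and is_labelled_homomorphism(f, a, b)


labels = {1: "d1", 2: "d2"}
in_parallel = ({1, 2}, set(), labels)        # two unrelated vertices
in_sequence = ({1, 2}, {(1, 2)}, labels)     # the same vertices with an edge
identity = {1: 1, 2: 2}

adds_edge = is_smoothing(identity, in_parallel, in_sequence)
drops_edge = is_smoothing(identity, in_sequence, in_parallel)
```

The identity map is a smoothing homomorphism only in the direction that adds the edge, which illustrates why such a map preserves vertices without being an isomorphism.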

The notion of a smoothing homomorphism defined above is used to characterise the sequence rule.
To capture both the sequence rule and the interact rule, interaction homomorphisms are introduced. The 
definition involves a coherence condition which captures the conditions under which an interaction may 
occur. Two vertices can interact if they have complementary labels and they are in parallel with each 
other. This leads to the following definition. 

Definition 5.5. For a labelled graph $A = (V,E,\mu)$, define $u \smile v$ in A such that there is no directed path
between u and v, and either $\mu(u) = d$ and $\mu(v) = \overline{d}$, or $\mu(u) = \overline{d}$ and $\mu(v) = d$. Let $A = (V,E,\mu)$ and $B = (V',E',\mu')$ be
labelled DAGs. An interaction homomorphism f from A to B is a labelled graph homomorphism such
that f is onto and, if $f(u) = f(v)$, one of the following holds: either $u \smile v$ in A and $\mu'(f(u)) = [d]$; or
$u = v$ and $\mu(u) = \mu'(f(u))$.

The following example demonstrates two compatible vertices mapped to the same vertex by an in- 
teraction homomorphism. 



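[Diagram: an interaction homomorphism $f$ between two labelled DAGs over the labels a, b, d and $\overline{d}$, in which two parallel vertices labelled $d$ and $\overline{d}$ are mapped to a single vertex labelled $[d]$.]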



Note that the diagrams in examples represent equivalence classes of labelled graphs up to labelled graph 
isomorphism. Thus only the labels and not the underlying vertices are indicated. The same practice is 
followed when presenting provenance diagrams. 

The homomorphisms defined over labelled DAGs are used to generate ideals. Ideals are sets of 
labelled series-parallel DAGs closed with respect to either smoothing or interacting homomorphisms. 

Definition 5.6. A smoothing/interacting ideal I is a set of labelled series-parallel DAGs such that if $A \in I$
and there exists a smoothing/interacting homomorphism $f : A \to B$, then $B \in I$. For any set of labelled
series-parallel DAGs P the smoothing/interacting ideal closure of P, denoted $\iota_s P$ / $\iota_i P$, is the least ideal
containing P, defined as the intersection of all smoothing/interacting ideals I such that $P \subseteq I$.

These ideals are employed to denote processes in the next section. Ideal closure is essential for the 
denotation of parallel composition. 



5.3 Correctness of the Denotational Semantics 

The denotational semantics for processes is defined using the ideals introduced in the previous section. 
Most operations on ideals are the obvious point-wise extension of the corresponding operator. The main 
subtlety is that parallel composition introduces new possibilities for both smoothing and interaction, 
which are not represented by the point-wise parallel composition of ideals. Thus the ideal closure is em- 
ployed to denote parallel composition. Valuations are used to represent substitutions, which are required 
to denote existential quantification. 

Definition 5.7 (denotation). A valuation v is a mapping from variables to names. Let $v[x \mapsto a]$ be the
valuation which is the same as v except at x where it maps to a. The effect of a valuation on a label is
defined as follows.

\[
\overline{d}^{\,v} = \overline{d^v} \qquad [d]^v = [d^v] \qquad (A_0 \ldots A_n)^v = A_0^v \ldots A_n^v \qquad a^v = a \qquad x^v = v(x)
\]

The denotation of a process with respect to a valuation v satisfies the following, where $h \in \{s,i\}$, $\epsilon$ is the
set containing the empty labelled graph, and $\epsilon(l,v)$ is the equivalence class of labelled graph with one
vertex labelled with $l^v$ with respect to labelling isomorphism.

\[
\begin{array}{c}
[\![v, I]\!]_h = \epsilon \qquad
[\![v, l]\!]_h = \epsilon(l,v) \qquad
[\![v, \exists x.P]\!]_h = \bigcup_{a \in \mathit{Names}} [\![v[x \mapsto a], P]\!]_h \\[1ex]
[\![v, P \oplus Q]\!]_h = [\![v,P]\!]_h \cup [\![v,Q]\!]_h \qquad
[\![v, P \mathbin{;} Q]\!]_h = \{ A \mathbin{;} B \mid (A,B) \in [\![v,P]\!]_h \times [\![v,Q]\!]_h \} \\[1ex]
[\![v, P \parallel Q]\!]_h = \iota_h \{ A \parallel B \mid (A,B) \in [\![v,P]\!]_h \times [\![v,Q]\!]_h \}
\end{array}
\]

All the operations used in the denotational semantics preserve ideals, as verified by the following 
proposition. Therefore the denotational semantics is a well defined mapping from processes to ideals. 

Proposition 5.8. The following are ideals: $\epsilon$, $\epsilon(l,v)$, the union and intersection of sets of ideals, and the
point-wise sequential composition of ideals. 

Soundness of the calculus defined in Sec. 3 with respect to the denotation is straightforward. The
proof follows from checking that all equations of the structural congruence hold as set equality of ideals, 
and that all deductive rules hold as set inclusions of ideals. 

Theorem 5.9 (soundness). If P yields Q, then $[\![v,P]\!]_i \subseteq [\![v,Q]\!]_i$ for all valuations v.

Completeness of the calculus with respect to the denotation is more challenging. The proof follows 
from interpolation lemmas. An interpolation lemma establishes that if there is a strict inclusion between 
the denotation of processes then there must be a finite sequence of deductions that can be applied to 
transform one process into the other process. The trick is to rewrite processes into a normal form and 
deal with each deductive rule one by one. 

Firstly consider series-parallel terms, which are processes which do not feature any choice or exists.
Two interpolation lemmas apply to series-parallel terms. The first interpolation lemma, stated below, 
deals only with the sequence rule. This lemma is closely related to the interpolation lemma established 
in [9], where a similar calculus without interactions is considered. Thus only smoothing ideals are
treated. 

Lemma 5.10 (sequence interpolation). Given two series-parallel terms P and Q, if $[\![v,P]\!]_s \subseteq [\![v,Q]\!]_s$ for
all valuations v, then either: $[\![v,P]\!]_s = [\![v,Q]\!]_s$ for all valuations v; or there exists R such that $[\![v,P]\!]_s \subseteq
[\![v,R]\!]_s \subseteq [\![v,Q]\!]_s$ for all valuations v, and P yields R is provable using only the sequence rule.

The above result is extended to interacting homomorphisms in the following interpolation lemma.
The proof of this lemma is an important technical contribution of this work. It shows that, for any strict
inclusion between the denotations of series-parallel processes, either the sequence rule or the interact rule
can be applied. 

Lemma 5.11 (interaction interpolation). Given two series-parallel terms P and Q, if $[\![v,P]\!]_i \subseteq [\![v,Q]\!]_i$ for
all valuations v, then: either $[\![v,P]\!]_s \subseteq [\![v,Q]\!]_s$ for all valuations v; or there exists R such that $[\![v,P]\!]_i \subseteq
[\![v,R]\!]_i \subseteq [\![v,Q]\!]_i$ for all valuations v and P yields R is provable using only the interact rule.

Proof. Assume that P and Q are series-parallel terms such that |[v, PJ,- c |[v, QJ t for all valuations v. 
Also assume that Iv,PTJ v <£ |[v, Q\ s for some valuation v. Since P, Q are series-parallel terms, there exist 
series-parallel DAGs Do = (Vo,Eo,po), D\ - {V\,E\,p\) such that i,-Do = |v,P]] ; - and t;Di = |[v, QJp Also, 
since |[v, PTJ,- c |[v, QJ h there exists an interacting homomorphism /: D\ Do. 

There must be at least one interaction in the homomorphism / exhibited above, i.e. there exists 
m,n e V\ such that f(m) = f(n), m ^ n and f(m) = w e Vq such that /io(w) = [rfj. Suppose otherwise, 
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then for all m,n € V\ if f(m) - f(n) then m = n, and so / is bijective, since interacting homomorphisms 
are surjective. Hence / is a smoothing homomorphism from D\ to Do, so Dv,/ 5 ]^ c |[v, QJ S contradicting 
the above assumption. 

A DAG D2 = (V2,E2,/iz) is constructed to differ from Do only by the interaction exhibited by 
/. Firstly, take Vo, remove vertex w and include vertices m and n, so V2 = Vo \ {w\ U {m,n\. Let 
£o\w be the set of edges in Eq without the vertex w and define E2 = (Eq \ w) U {(x, m) \ (x, w) € Eq} U 
{(m,x) I (w,x) € Eq} U {(x,n) \ (x,w) € Eq} U {(n,x) | (w,x) e Eq}. Retain all the labels of hq except at m 
and n, so if x - m or x = n then ^2 00 = j"i00 an d otherwise ^00 = l^o(x). 

Construct two homomorphisms from g: D2 —> Dq and h: Di — > D2 as follows. 



Clearly $f = g \circ h$. Furthermore, both $g$ and $h$ are interacting homomorphisms by the following
arguments. Check that $g$ is a graph homomorphism, by case analysis. Only one case is presented. By
definition of $E_2$, if $(m, x) \in E_2$ then $(m, x) \in \{(m, x) \mid (w, x) \in E_0\}$, thus
$(g(m), g(x)) = (w, x) \in E_0$. Also check that $g$ is an interaction homomorphism, as follows. If
$g(x) = g(y)$ then either $x = y$, or $x = m$ and $y = n$. Clearly, $m$ and $n$ are not connected in
$E_2$ and both $\mu_2(m) = \mu_1(m)$ and $\mu_2(n) = \mu_1(n)$ hold, so $m$ and $n$ are related in $D_2$
as an interaction requires, and $\mu_0(f(m)) = \mu_0(w) = \lfloor d \rfloor$. Check that $h$ is a graph
homomorphism. Only one case is presented. If $(m, x) \in E_1$ then $(w, f(x)) \in E_0$ since $f$ is a
graph homomorphism, thus $(m, f(x)) \in \{(m, x) \mid (w, x) \in E_0\}$ so $(m, f(x)) \in E_2$, by
definition. Now consider when $h(x) = h(y)$. If $x = y$ there is nothing to show. If
$x, y \notin \{m, n\}$ then $f(x) = f(y)$, thus $x$ and $y$ are related as an interaction requires,
since $f$ is an interacting homomorphism. Otherwise suppose, without loss of generality, that $x = m$
and $y \notin \{m, n\}$, so $m = f(y)$, but $m \notin V_0$, contradicting the definition of $f$. Thus
$h$ is an interaction homomorphism.
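
The factorisation of $f$ can be checked mechanically on a small instance. The sketch below assumes that $g$ sends both $m$ and $n$ to $w$ and fixes every other vertex, and that $h$ fixes $m$ and $n$ and agrees with $f$ elsewhere, which is the reading suggested by the case analysis above. It also assumes that a graph homomorphism maps every edge to an edge. The helper names and the toy graphs are hypothetical.

```python
def make_g(w, m, n):
    """g : D2 -> D0 sends both m and n to w and fixes every other vertex."""
    return lambda x: w if x in (m, n) else x


def make_h(f, m, n):
    """h : D1 -> D2 fixes m and n and agrees with f on every other vertex."""
    return lambda x: x if x in (m, n) else f(x)


def is_graph_hom(phi, E_src, E_tgt):
    """Assumed notion of graph homomorphism: every source edge maps to a target edge."""
    return all((phi(x), phi(y)) in E_tgt for (x, y) in E_src)


# Hypothetical toy instance.
# D0: a -> w -> b.  D1: a1 -> m, a1 -> n, m -> b1, n -> b1.
E0 = {("a", "w"), ("w", "b")}
V1 = {"a1", "m", "n", "b1"}
E1 = {("a1", "m"), ("a1", "n"), ("m", "b1"), ("n", "b1")}
# D2 is D0 with w split into m and n.
E2 = {("a", "m"), ("a", "n"), ("m", "b"), ("n", "b")}

f_map = {"a1": "a", "m": "w", "n": "w", "b1": "b"}
f = lambda x: f_map[x]
g = make_g("w", "m", "n")
h = make_h(f, "m", "n")

factorises = all(g(h(x)) == f(x) for x in V1)
print(factorises, is_graph_hom(g, E2, E0), is_graph_hom(h, E1, E2))
```

The check only covers the graph-homomorphism and factorisation conditions; the label conditions of an interacting homomorphism are not modelled here.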

Furthermore, the constructed DAG, $D_2$, is series-parallel. Suppose otherwise; then there exists an
N-shape isomorphic to a subgraph of $D_2$. Now consider the image of the N-shape under $g$. If zero
or one nodes in the N-shape are $m$ or $n$, then the image of the N-shape is an N-shape in $D_0$. By
Theorem 5.3 this contradicts the fact that $D_0$ is series-parallel. Now, suppose that both $m$ and $n$
are in the N-shape. Since $m$ and $n$ are related by an interaction in $D_2$, they are not connected, so
an N-shape must be of the form $\{(m, x), (n, x), (n, y)\}$ or $\{(x, m), (x, n), (y, n)\}$. However,
$(m, x) \in D_2$ iff $(w, x) \in D_0$ iff $(n, x) \in D_2$, and $(x, m) \in D_2$ iff $(x, w) \in D_0$
iff $(x, n) \in D_2$, so neither shape is a subgraph of $D_2$. Thus $D_2$ is N-free, hence by
Theorem 5.3 $D_2$ is a series-parallel DAG.
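
The N-freeness argument can be illustrated with a brute-force check. This sketch assumes an N-shape is four distinct vertices $a, b, c, d$ whose induced edges are exactly $(a, c)$, $(b, c)$ and $(b, d)$, and it tests the edge relation exactly as given. Whether the paper's definition applies to the edges or to their transitive closure is not stated in this section, so that choice is an assumption, as are the function name and the toy graphs.

```python
from itertools import permutations


def has_n_shape(V, E):
    """Return True if some four vertices induce exactly the edges of an N-shape."""
    E = set(E)
    for a, b, c, d in permutations(V, 4):
        wanted = {(a, c), (b, c), (b, d)}
        four = (a, b, c, d)
        induced = {(x, y) for x in four for y in four if (x, y) in E}
        if induced == wanted:
            return True
    return False


# Hypothetical examples: the diamond obtained by splitting a vertex, and a bare N.
diamond_V = {"a", "m", "n", "b"}
diamond_E = {("a", "m"), ("a", "n"), ("m", "b"), ("n", "b")}
n_V = {"a", "b", "c", "d"}
n_E = {("a", "c"), ("b", "c"), ("b", "d")}

print(has_n_shape(diamond_V, diamond_E), has_n_shape(n_V, n_E))
```

The search is quartic in the number of vertices, which is adequate for small diagrams but is not meant as an efficient recognition algorithm for series-parallel DAGs.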

Since $D_2$ is a series-parallel DAG, there exists a series-parallel term $R$ such that
$[\![v, R]\!]_i = \downarrow_i D_2$. Since $g\colon D_2 \to D_0$ exhibits an interaction and
$h\colon D_1 \to D_2$, the following inequalities hold: $[\![v, P]\!]_i \subseteq [\![v, R]\!]_i$ and
$[\![v, R]\!]_i \subseteq [\![v, Q]\!]_i$. Since $\mu_0(w) = \lfloor d \rfloor$, the sub-term
$\lfloor d \rfloor$ must appear in the process $P \equiv \mathcal{S}\{\lfloor d \rfloor\}$, for some
context $\mathcal{S}\{\ \}$. Also, since $m \neq n$ and the edges of $D_0$ differ from those of $D_2$
only in that the edges connected to $w$ in $D_0$ are instead connected to both $m$ and $n$ in $D_2$, the
following holds: $R \equiv \mathcal{S}\{d \parallel d\}$. Thus the interact rule proves that $P$ yields
$R$, as required. □

To clarify the significance of the interpolation lemmas, consider the running example. The initial
configuration of processes is denoted by the following DAG ($D_1$).



[Diagram of $D_1$. Vertex labels: (Turner loc Tate), (Turner loc Tate), (Turner loc London),
(Turner loc Baltic), (Turner loc London), (Turner loc Baltic), (Turner loc UK).]

There exists an interaction homomorphism from $D_1$ to the DAG $D_0$ below, which appears also in Sec. 2.



[Diagram of $D_0$. Vertex labels: [(Turner loc Tate)], [(Turner loc London)], [(Turner loc Baltic)],
(Turner loc UK), (Turner loc Tate).]



Now, by applying Lemma 5.11 three times, we can construct a series of DAGs culminating in $D_2$,
presented below, such that the following properties hold. There exist interaction homomorphisms from
$D_1$ to $D_2$ and from $D_2$ to $D_0$, and the process denoted by $D_0$ yields the process denoted by
$D_2$ using the interact rule three times. Furthermore, the homomorphism from $D_1$ to $D_2$ is a
smoothing homomorphism. Hence, by Lemma 5.10, the process denoted by $D_2$ can be transformed into the
process denoted by $D_1$ by applying the sequence rule a finite number of times.



[Diagram of $D_2$. Vertex labels: (Turner loc Tate), (Turner loc Tate), (Turner loc London),
(Turner loc London), (Turner loc Baltic), (Turner loc Baltic), (Turner loc UK), (Turner loc Tate).]



Thereby the existence of the interaction homomorphism between $D_1$ and $D_0$ guarantees the existence
of a deduction from the process denoted by $D_0$ to the process denoted by $D_1$ using the interact and
sequence rules. Indeed, the processes and deductions are presented in the example at the end of an
earlier section, where the first process in the example is denoted by $D_1$, the second by $D_2$ and
the third by $D_0$.

Every process can be written in a normal form, using the structural congruence, as a sum of
series-parallel processes with all the existential quantification moved to the front of the process,
i.e. for all $P$ there exist series-parallel processes $A_i$ such that
$P \equiv \exists \vec{x}. \sum_{i \in I} A_i$. It is then easy to show that a finite number of choice
and exists rules can be applied to prove any inequality between ideals. This establishes the
completeness of the calculus with respect to the denotation, stated as follows.

Theorem 5.12 (completeness). If $[\![v, P]\!] \subseteq [\![v, Q]\!]$ for all valuations $v$, then $P$ yields $Q$.

Thus the model based on ideals of labelled series-parallel DAGs is a sound and complete model of
processes. The labelled DAGs are inspired by the guidelines provided for provenance diagrams [15],
while the series-parallel processes are motivated by calculi which model systems that produce
provenance diagrams. Hence a formal connection between series-parallel DAGs and processes is
established. Specifically, provenance diagrams are the denotation of series-parallel processes
consisting of only artefacts and stored data. Hence provenance diagrams are contained within a
denotation for a provenance tracking calculus. Due to soundness and completeness of the calculus with
respect to the denotation, provenance diagrams can be considered in a new operational,
language-based setting.

6 Conclusion

Provenance is a key problem in processing data which is particularly important in systems that publish 
data on the Web, such as the Web of Linked Data (HE]]. Already certain aspects of provenance are 
gifted with deep theoretical results [10]. However, there is no sound and complete model for the
aspects of provenance tracking considered in this work: specifically, "how" provenance, which
indicates causal relationships, and a provenance tracking calculus which produces provenance diagrams
by recording interactions between processes and stored data. The relationship between the diagrams
and the calculus is exhibited by providing a sound and complete denotational semantics which contains
such provenance diagrams.

The examples presented in this paper illustrate that tracking provenance is particularly challenging 
in a concurrent setting. The causal aspects of data provenance are closely related to the operational
semantics of the systems involved. Hence, when considering concurrent systems, models of concurrency
provide insight into problems associated with provenance in a concurrent setting. For instance, this
work demonstrates that provenance diagrams that arise from concurrent interactions form
series-parallel DAGs. Consequently, certain graph homomorphism problems, which can be employed to
query provenance diagrams, can be solved more efficiently for series-parallel digraphs [19]. This
model is proposed as a foundation for "how" provenance, which can be applied as a subjective measure
of the quality of data.